Loss Functions For Improved On-Policy Control

Authors

  • Matthew Robards
  • Peter Sunehag
Abstract

We introduce and empirically evaluate two novel online gradient-based reinforcement learning algorithms with function approximation: one model-based and the other model-free. These algorithms admit non-squared loss functions, which is novel in reinforcement learning and appears to bring empirical advantages. We further extend a previous gradient-based algorithm to the case of full control by using generalized policy iteration. Theoretical properties of these algorithms are studied in a companion paper.
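
To make the idea of a non-squared loss concrete, the following is a minimal sketch of a generic on-policy (SARSA-style) semi-gradient update with linear function approximation, in which the loss applied to the temporal-difference error can be swapped. It is not the authors' algorithm: the function names (`squared_grad`, `huber_grad`, `sarsa_step`), the choice of the Huber loss, and the parameters `alpha`, `gamma`, and `kappa` are all illustrative assumptions.

```python
def squared_grad(delta):
    """Derivative of the squared loss 0.5 * delta**2 with respect to delta."""
    return delta


def huber_grad(delta, kappa=1.0):
    """Derivative of the Huber loss: equals delta inside [-kappa, kappa], clipped outside.

    The Huber loss is used here only as one example of a non-squared loss (assumption).
    """
    return max(-kappa, min(kappa, delta))


def q_value(weights, features):
    """Linear action-value estimate: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def sarsa_step(weights, phi, reward, phi_next, loss_grad, alpha=0.1, gamma=0.9):
    """One semi-gradient on-policy update; returns the new weight list.

    phi and phi_next are the (hypothetical) feature vectors of the current and
    next state-action pairs, both chosen by the policy being followed.
    """
    # Temporal-difference error of the current estimate.
    delta = reward + gamma * q_value(weights, phi_next) - q_value(weights, phi)
    # The loss enters only through its derivative at delta.
    step = alpha * loss_grad(delta)
    return [w + step * x for w, x in zip(weights, phi)]


if __name__ == "__main__":
    w0 = [0.0, 0.0]
    phi, phi_next = [1.0, 0.0], [0.0, 1.0]
    # A large reward produces a large TD error; the two losses react differently.
    print(sarsa_step(w0, phi, 5.0, phi_next, squared_grad))
    print(sarsa_step(w0, phi, 5.0, phi_next, huber_grad))
```

With the squared loss the step grows in proportion to the error, while the clipped derivative of the Huber loss bounds the step, which is one plausible reason a non-squared loss might behave differently in practice.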

Related resources

An Improved Imperialist Competitive Algorithm based on a new assimilation strategy

Meta-heuristic algorithms inspired by natural processes are a class of optimization algorithms that have received attention in recent years, such as genetic algorithms, particle swarm optimization, ant colony optimization, and the firefly algorithm. Recently, a new kind of evolutionary algorithm has been proposed that is inspired by the human sociopolitical evolution process. This new algorith...

Access and Mobility Policy Control at the Network Edge

The fifth generation (5G) system architecture is defined as service-based and the core network functions are described as sets of services accessible through application programming interfaces (API). One of the components of 5G is Multi-access Edge Computing (MEC) which provides the open access to radio network functions through API. Using the mobile edge API third party analytics applications ...

Challenges to Soil Erosion Control Measures among Farmers in Anambra State, Nigeria: Implications for Extension Policy

The study investigated challenges to soil erosion control measures among farmers in Anambra State, Nigeria. Purposive, multistage and random sampling techniques were employed to select a sample of two hundred and forty (240) respondents. A structured interview schedule was used for data collection. Frequency counts, percentages, mean scores and factor analysis were used for data analysis. ...

The Factors Affecting on Banking Crisis Loss with Emphasis on Policy Frameworks

The main purpose of this study is to identify the determinants of banking crisis loss, the variables of policy framework especially, for 12 sample countries over the period 1980-2019. Accordingly, we extracted pre-crisis and post-crisis trends from countries' real GDPs and then calculated output loss for the crisis year and three years afterwards. In the following, we used the Poisson quasi-max...

The Effectiveness of Cognitive Rehabilitation on Increased Attention and Memory Functions in Heroin Addicts

Objective: The purpose of the present study was to determine the effectiveness of cognitive rehabilitation on attention and memory functions in heroin addicts. Method: The present study was a quasi-experimental study with pre-test and post-test with control group. The statistical population of the study consisted of all addicts of 3 addiction withdrawal clinics in Tehran in 2017. According to t...

Publication date: 2011